arXiv:1501.00950v2 [cs.IT] 9 Apr 2015 




Influence of Behavioral Models

Abstract: In order to characterize the channel capacity of a
wavelength channel in a wavelength-division multiplexed (WDM)
system, statistical models are needed for the transmitted signals
on the other wavelengths. For example, one could assume that the
transmitters for all wavelengths are configured independently of
each other, that they use the same signal power, or that they use
the same modulation format. In this paper, it is shown that these
so-called behavioral models have a profound impact on the
single-wavelength achievable information rate. This is demonstrated by
establishing, for the first time, upper and lower bounds on the
maximum achievable rate under various behavioral models, for
a rudimentary WDM channel model.

Index Terms: Achievable information rate, behavioral models,
channel capacity, multiuser communications, mutual information,
network information theory, nonlinear interference,
wavelength-division multiplexing.


I. Introduction 

One of Shannon's most significant contributions was the
definition of the channel capacity as the highest achievable
throughput (in bit/symbol or bit/s/Hz) of a given communication
channel, at an arbitrarily low error probability [1].
He furthermore showed that a capacity-achieving transmission
scheme can operate by transmitting discrete-time symbols
generated from a suitably chosen input distribution, if certain
conditions are imposed on the allowed sequences of symbols.

In a more practical setting, the symbols correspond to
pulses, the input distribution to a modulation format, and the
allowed sequences of symbols to an error-correcting code.
The maximum throughput that can be achieved with the best
possible error-correcting code is, for a given channel and
a given input distribution, given by the mutual information
[2, Ch. 2, 7]. This quantity can be expressed as a (possibly
complicated but still explicit) integral over the joint distribution
of the transmitted and received symbols. Thus, it is a
function of the channel and the input distribution. To obtain
the channel capacity, which is a function of the channel alone,
the mutual information should therefore be maximized over all
possible input distributions (or modulation formats). Neither
this maximization nor the mutual information integral admits
an analytical solution in general, and the exact channel capacity
is therefore known only for a few specific channels, of which
the additive white Gaussian noise (AWGN) channel is the most
well known. This implies that for most practical channels, the
capacity is only known in terms of upper and lower bounds.

This work was supported in part by the Swedish Research Council (VR)
under grants 2012-5280 and 2013-5271. The material in this paper was
presented in part at the Optical Fiber Communication Conference (OFC),
Anaheim, CA, Mar. 2013.

E. Agrell is with the Dept. of Signals and Systems, Chalmers Univ. of
Technology, SE-41296 Göteborg, Sweden, email agrell@chalmers.se.
M. Karlsson is with the Dept. of Microtechnology and Nanoscience,
Chalmers Univ. of Technology, SE-41296 Göteborg, Sweden.

For the coherent fiber-optic channel, the AWGN channel
model is a good starting point, due to the amplified spontaneous
emission (ASE) noise in optical amplifiers, but the
nonlinearities of the optical fiber will make this channel model
inaccurate for sufficiently high signal powers. Assuming the
added ASE noise variance $P_\mathrm{ASE}$ to be fixed and known, the
question is, how will the channel capacity $C(P)$ behave as
a function of the signal power $P$? There is a common and
reasonable belief 0-0 that the nonlinearity will somehow
limit the available capacity for fiber links, but the question is
to what extent.

For the single-wavelength channel, the capacity analysis was
pioneered in 0, where the capacity was shown to reach a maximum and
then decay as the signal power increases, a behavior more recently
referred to as the "nonlinear Shannon limit" 0, 0. However,
more or less all such plots formally represent lower bounds
on the channel capacity, as pointed out, e.g., in a, 0-aioi,
since they are obtained from analysis over a finite subset of all
possible input distributions or using suboptimal (mismatched)
receivers. It is possible to show that the channel capacity will
not decay at high signal powers, provided that a sufficiently
exhaustive search over input distributions is carried out at each
signal power level HD, na. Moreover, it can be shown that
the use of a finite-memory channel model will also raise the
lower capacity bounds at high signal powers to nonzero values
US).

In this paper, which is an extension of M, we will deal
with the capacity of multichannel systems, e.g.,
wavelength-division multiplexed (WDM) optical links, for which the
situation is more subtle. The current paradigm in optical
multiuser communications 0-0,0, is to analyze
the capacity of a single user in the system, say user 1,
assuming that the other users are outside our control. We will
therefore call user 1 the primary user and the other users,
whose transmissions cause interference to user 1, interferers.
More formally, the quantity of interest is the achievable
information rate $C_1 = \sup I(X_1;Y_1)$, where $I(X_1;Y_1)$ denotes
the mutual information between the input $X_1$ and output $Y_1$
of subchannel 1, and the maximization is over all possible
input distributions (modulation formats) $f_{X_1}$. These quantities
will be mathematically defined in Sec. IV, where it is also
remarked that $C_1$ is in general not a channel capacity in
the information-theoretic sense. It is instructive to contrast
with wireless multiuser systems, where the transmitters are
typically designed jointly (but possibly operated separately)
and the relevant capacity measure is a multidimensional object,
the capacity region, which describes the set of achievable
throughputs for all users simultaneously [27], [2, Ch. 15], EH
Ch. 6].

Two kinds of models are needed to fully describe a multiuser
system as a single-user channel model $X_1 \to Y_1$,
as illustrated in Fig. 1: the first is a discrete-time multiuser
channel model, which gives the statistics of the channel
outputs $Y_1,\ldots,Y_M$ as functions of the inputs $X_1,\ldots,X_M$,
and the second is a behavioral model, which relates the
interferers' distributions $f_{X_2},\ldots,f_{X_M}$ to the primary user
input distribution $f_{X_1}$. Obviously, $f_{X_1}$ needs to be optimized
for the considered multiuser channel model in order to attain
the channel capacity, but how shall the interferers, which
cause interference to the primary user, behave during this
optimization process? Will they be passive, or are they allowed
to adapt their signaling power and/or modulation format to the
power and/or modulation format of the primary user? These
questions are not explicitly addressed in most
papers on optical multiuser capacity. The notable exception
is the work by Taghavi et al. ED, where both the capacity
region and some bounds thereon were defined for a WDM
system model, based on a Volterra approach. Their main
conclusion (based on simulations of a simplified, nonlinear
channel model) was that $C_1(P)$ is unbounded if the receiver
can use multiuser detection to cancel nonlinear interference,
and saturates (monotonically) at a constant value in the special
case of increasing all user powers $P$ simultaneously.

In this paper, we discuss and classify the different behavioral
models used in the literature, and give an illustrative example
of multiuser capacity for a simple nonlinear optical channel
model, together with some general conclusions on how the
selected behavioral model for the interferers influences $C_1(P)$.
Although the idealized channel model is not fully realistic,
it serves the purpose of exemplifying, for the first time,
the profound impact of behavioral models on the nonlinear
channel capacity. The paper is organized as follows. In Sec. II,
the multiuser nonlinear channel model is described and its
parameters are defined. The behavioral models are defined
in Sec. III, where we also attempt to classify the behavioral
models considered in earlier optical channel capacity studies.
After mathematically defining the channel capacity and related
quantities in Sec. IV, upper and lower bounds are derived in
Sec. V and VI, respectively. The obtained bounds are plotted and
discussed in Sec. VII. The paper concludes in Sec. VIII with
a discussion about the validity of the results and their potential
extensions to more realistic optical channel models.

We use uppercase notation $X$ for random variables and
lowercase $x$ for deterministic variables. Probability density
functions are denoted as $f_X(x)$ and conditional probability
density functions as $f_{Y|X}(y|x)$, where the subscripts will
sometimes be omitted if they are clear from the context.

II. System model

In order to exemplify the information-theoretic nature of
various behavioral models in optical communications, we need
a simple, yet nontrivial, channel model for a WDM link,
which enables analytical and numerical calculations of upper
and lower bounds on the achievable rates. Linear modulation
is used in the transmitter, and the receiver applies coherent
matched filtering and sampling. We select a simplified model
with three equispaced WDM channels enumerated by $i = 1, 2, 3$.
For simplicity, we assume that four-wave mixing dominates
over self- and cross-phase modulation. This scenario
arises, e.g., when the generalized phase-matching condition
is fulfilled E9l. The dispersion and the nonlinearity are both
assumed weak, which means that the nonlinear phase shift
$\phi_\mathrm{NL} \ll 1$. Under these assumptions, the coupled nonlinear
differential equations can be linearized in propagation distance
by a perturbative analysis. A detailed discussion and the full
set of coupled equations for this situation can be found in E9l.
We find that the complex discrete-time output signals $Y_i$ are
given by a nonlinear channel model according to

Fig. 1. A single-user channel model can be seen as a combination of
a multiuser channel model and a behavioral model for all users but one.
Transmitter and receiver are marked Tx and Rx, respectively.

$$ Y_1 = X_1 + \epsilon X_2^2 X_3^* + N_1, \tag{1} $$
$$ Y_2 = X_2 + 2\epsilon X_1 X_2^* X_3 + N_2, \tag{2} $$
$$ Y_3 = X_3 + \epsilon X_2^2 X_1^* + N_3, \tag{3} $$

where $X_i$ are independent, complex channel inputs and $N_i$ are
independent, complex, circularly symmetric, white Gaussian
noise signals, each with zero mean and equal variance. The
indices in (1)–(3) are the same as in EtI Eq. (8)], (Ml Eq. (6)],
confining the WDM system to 3 wavelengths and ignoring
self- and cross-phase modulation terms. Similar models were
derived in the context of noncoherent WDM systems with
on-off keying modulation ED, EH. As in ED and other works,
our intention is not to present an accurate channel model,
but rather the opposite: We wish to use the simplest possible
nonlinear WDM model that will allow us to qualitatively
compare different behavioral models.
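
As a rough illustration of the channel model, the following Python sketch draws one output triple from given inputs. It is not taken from the paper: the function names `zcg` and `fwm_channel` are hypothetical, and the third output assumes the symmetric form $Y_3 = X_3 + \epsilon X_2^2 X_1^* + N_3$, which is inferred from the symmetry between subchannels 1 and 3.

```python
import math
import random

def zcg(power, rng):
    """Zero-mean circularly symmetric complex Gaussian sample, E|X|^2 = power."""
    s = math.sqrt(power / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))

def fwm_channel(x1, x2, x3, eps, p_ase, rng):
    """One use of the three-wavelength four-wave-mixing channel (sketch)."""
    y1 = x1 + eps * x2**2 * x3.conjugate() + zcg(p_ase, rng)
    y2 = x2 + 2 * eps * x1 * x2.conjugate() * x3 + zcg(p_ase, rng)
    # Assumed symmetric counterpart of the first equation
    y3 = x3 + eps * x2**2 * x1.conjugate() + zcg(p_ase, rng)
    return y1, y2, y3

if __name__ == "__main__":
    rng = random.Random(0)
    print(fwm_channel(1 + 0j, 1j, 1 + 0j, 0.5, 0.01, rng))
```

Setting `p_ase` to zero makes the outputs deterministic, which is convenient for checking the interference terms by hand.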

In this work, we consider the single-wavelength detection
scenario, as it was defined in E3. This means that each
receiver $i$ receives its own signal $Y_i$, with no information about
the other received signals $Y_j$ for $j \ne i$. Furthermore, receiver
$i$ knows the distributions $f_{X_j}$ of the other users $j \ne i$, but not
their codebooks. Hence, multiuser detection [27], (Ml, [34] is
possible, but not simultaneous decoding (Ml Ch. 6].

The channel model (1)–(3) is characterized by two parameters,
$\epsilon$ and $P_\mathrm{ASE} = \mathbb{E}[|N_i|^2]$. In an $n$-span amplified link,
the single-polarization noise variance (power) equals
$P_\mathrm{ASE} = n n_\mathrm{sp}(G - 1) h\nu B$, where $n_\mathrm{sp}$ is the spontaneous emission
factor, $h\nu$ the photon energy, $G$ the gain of each amplifier,
which also equals the span loss, and $B$ the signal bandwidth.
The constant in (1)–(3) is $\epsilon = n \gamma L_\mathrm{eff}$, where $\gamma$ is the fiber
nonlinear coefficient and $L_\mathrm{eff}$ the effective nonlinear amplifier
span length, related to the physical amplifier separation $L$ via
$L_\mathrm{eff} = (1 - \exp(-\alpha L))/\alpha$ with $\alpha$ being the fiber attenuation
coefficient. One may improve the model by multiplying $\epsilon$
with a complex factor depending on the phase mismatch,
attenuation factor, and span length, but we neglect this for
simplicity.

For the numerical examples in Sec. VII, the following
parameters will be used. We consider a link with $n = 16$
amplifier spans. The gain of each is $G = 30$ dB and the
spontaneous emission factor is $n_\mathrm{sp} = 2$. The signal bandwidth
is $B = 40$ GHz, and with $h\nu = 0.128$ aJ, $\gamma = 1.6$ W$^{-1}$ km$^{-1}$,
and $L_\mathrm{eff} = 24$ km, we get $P_\mathrm{ASE} = 0.16$ mW and $\epsilon = 610$
W$^{-1}$. The condition $\phi_\mathrm{NL} = \epsilon P \ll 1$, where $P$ is the signal
power, translates to $P \ll 1.6$ mW, or a signal-to-noise ratio
of $P/P_\mathrm{ASE} \ll 10$ dB. We will apply this model, which was
derived under a weak nonlinearity assumption, also in the
strongly nonlinear regime, which, although inaccurate, is the
conventional approach in the literature.
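
The quoted values can be checked with a few lines of arithmetic. The Python sketch below is only a sanity check under two assumptions that are not spelled out in the text: that the nonlinear constant accumulates as $\epsilon = n\gamma L_\mathrm{eff}$, and that $\gamma$ is given in W$^{-1}$ km$^{-1}$. All variable names are illustrative.

```python
import math

# Parameters quoted in the text
n_spans = 16                  # number of amplifier spans n
gain_db = 30.0                # amplifier gain G in dB
n_sp = 2.0                    # spontaneous emission factor
bandwidth_hz = 40e9           # signal bandwidth B
photon_energy_j = 0.128e-18   # h*nu = 0.128 aJ
gamma_per_w_km = 1.6          # nonlinear coefficient (assumed unit 1/(W km))
l_eff_km = 24.0               # effective length per span

gain = 10 ** (gain_db / 10)
p_ase_w = n_spans * n_sp * (gain - 1) * photon_energy_j * bandwidth_hz

# Assumption: epsilon = n * gamma * L_eff
epsilon_per_w = n_spans * gamma_per_w_km * l_eff_km

# Power at which the nonlinear phase shift eps * P reaches 1
p_limit_w = 1 / epsilon_per_w
snr_limit_db = 10 * math.log10(p_limit_w / p_ase_w)

print(f"P_ASE   = {p_ase_w * 1e3:.3f} mW")
print(f"epsilon = {epsilon_per_w:.1f} 1/W")
print(f"P limit = {p_limit_w * 1e3:.2f} mW ({snr_limit_db:.1f} dB above P_ASE)")
```

Under these assumptions the results round to the values stated in the text (0.16 mW, about 610 W$^{-1}$, 1.6 mW, and 10 dB).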

III. Behavioral models in multiuser communications

Whenever a multiuser system is characterized by means of
a single-user channel capacity, the results are connected to a
certain behavioral model, as discussed above. The behavioral
models relate the input distributions of the interferers to
the primary input distribution. We study three fundamentally
different classes of behavioral models:

(a) Fixed interferer distributions. The interferer distributions
$f_{X_2},\ldots,f_{X_M}$ remain the same regardless of $f_{X_1}$. The
dashed arrow from Tx 1 in Fig. 1 does not exist in this
case. From the viewpoint of information theory, this is a
single-user channel.

(b) Adaptive interferer power. All users transmit with the
same power $P_1 = P_2 = P_3$, but not necessarily the same
distributions. The interferer distributions $f_{X_2},\ldots,f_{X_M}$
are fixed apart from a scale factor, which depends on $P_1$.

(c) Adaptive interferer distribution. All users transmit with
the same distribution and the same power,
$f_{X_1} = f_{X_2} = f_{X_3}$.
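
The three classes can be summarized as a mapping from the primary user's input distribution to the interferers' distributions. The Python sketch below is only an illustration of that mapping: the `InputDist` type, the family labels, and the function name are hypothetical and not part of the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InputDist:
    family: str    # illustrative label, e.g. "QPSK" or "Gaussian"
    power: float   # E|X|^2 in watts

def interferer_dists(model, primary, fixed):
    """Return (f_X2, f_X3) implied by behavioral model 'a', 'b', or 'c'."""
    if model == "a":
        # Fixed interferer distributions: the primary user is ignored
        return fixed, fixed
    if model == "b":
        # Adaptive power: keep the interferer family, copy the primary power
        scaled = InputDist(fixed.family, primary.power)
        return scaled, scaled
    if model == "c":
        # Adaptive distribution: copy the primary distribution entirely
        return primary, primary
    raise ValueError(f"unknown behavioral model: {model!r}")

if __name__ == "__main__":
    primary = InputDist("Gaussian", 1e-3)
    fixed = InputDist("QPSK", 0.5e-3)
    for model in "abc":
        print(model, interferer_dists(model, primary, fixed))
```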

The channel models used for WDM capacity analyses in
the literature fall in categories (b) and (c). Model (b) was
used by Wegener et al. [20], where on-off keying modulation
was assumed for the interferers [20, Eq. (15)], and a Gaussian
pdf assumed for the primary user, although all users had the
same power. Behavioral model (b) was also considered in ll2^
Fig. 2(b)], where the influence of interferer distributions on
the achievable rate of the primary user was studied. It was
concluded that Gaussian interferers caused worse interference
than quadrature phase-shift keying (QPSK) and ring-shaped
modulation, when the primary user applies Gaussian modulation
at the same power level as the interferers. Model (c) was
used in ®, Q, 0, Ea, Ga-iiTi, where it was explicitly
stated that every channel had the same modulation and power.
Multilevel ring-shaped modulation was used in 0, Q, 0,
[22], [25], four different modulation formats were used in
[26, Fig. 2(a)], and Gaussian modulation for all channels was
used in [27]. Quite a few studies have used models of the
nonlinear interference that do not depend on the choice of
modulation format, but only on the power spectral density of
the interferers. In these studies, the modulation of the interferers
is not specified, and the chosen behavioral model can be either
(b) or (c). This applies to 0, 0, IBl-llIll, GT], ED, ED.
As will be demonstrated in the following, the achievable
rates may vary significantly between behavioral models.


IV. Information theory

The mutual information between two random variables $X$
and $Y$ with joint distribution $f_{X,Y}$ and marginal distributions
$f_X(x) = \int f_{X,Y}(x,y)\,dy$ and $f_Y(y) = \int f_{X,Y}(x,y)\,dx$ is
defined as [2, Eq. (2.35)]

$$ I(X;Y) = \iint f_{X,Y}(x,y) \log_2 \frac{f_{X,Y}(x,y)}{f_X(x) f_Y(y)} \,dx\,dy, \tag{4} $$

where the integral is over the domain of $X$ and $Y$. If one
or both of $X$ and $Y$ are discrete, their distributions are
replaced with probability mass functions and the corresponding
integrals are replaced with sums. Similarly, the conditional
mutual information between $X$ and $Y$ given another random
variable $Z$ is defined as [2, Eq. (2.61)]

$$ I(X;Y|Z) = \iiint f_{X,Y,Z}(x,y,z) \log_2 \frac{f_{X,Y|Z}(x,y|z)}{f_{X|Z}(x|z) f_{Y|Z}(y|z)} \,dx\,dy\,dz. $$

If $X$ and $Y$ are the input and output, respectively, of a communication
channel, then the joint distribution can be separated into the
product $f_{X,Y}(x,y) = f_X(x) f_{Y|X}(y|x)$, where $f_X$ denotes
the input distribution and $f_{Y|X}$ denotes the channel. Thus,
the mutual information depends on both the input distribution
and the channel. More precisely, the mutual information gives
the highest achievable rate, in bit/symbol, of a given channel
and a given input distribution, if strong coding is allowed over
long blocks of symbols. As discussed in the Introduction, the
channel capacity is

$$ C = \sup_{f_X} I(X;Y), \tag{5} $$

which is a function of the channel only, not of the input
distribution. From a practical viewpoint, the optimization over
input distributions in (5) can be regarded as an optimization
over modulation formats.
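
For discrete variables, the definition of mutual information above reduces to a double sum, which is easy to evaluate directly. The following Python sketch is only an illustration of that discrete form; the function name and the binary symmetric channel example are not from the paper.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits for a joint pmf given as a nested list joint[x][y]."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                           # 0 log 0 is taken as 0
                total += p * math.log2(p / (px[i] * py[j]))
    return total

if __name__ == "__main__":
    # Uniform input to a binary symmetric channel with crossover 0.1
    bsc = [[0.45, 0.05], [0.05, 0.45]]
    print(round(mutual_information(bsc), 4))
```

Maximizing this quantity over the input marginal while keeping the conditional pmf fixed corresponds to the capacity definition in the text.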

In the multiuser scenario considered in this paper, we are
interested in the channel capacity of one subchannel. Inspired
by (5), one can define

$$ C_i(P_i) = \sup_{f_{X_i}:\, \mathbb{E}[|X_i|^2] = P_i} I(X_i;Y_i), \tag{6} $$

where $P_i = \mathbb{E}[|X_i|^2] = \int |x|^2 f_{X_i}(x)\,dx$ is the signal power
of subchannel $i$. This quantity is an achievable rate of
subchannel $i$ and has been studied in numerous publications
in optical communications. It is often called the channel
capacity, although, strictly speaking, the single-user channels
in Fig. 1 have no channel capacity in an information-theoretic
sense, since Shannon's channel coding theorem, according to
which (5) gives the maximum achievable rate of the channel
described by $f_{Y|X}$, assumes the channel law $f_{Y|X}$ to remain
the same throughout the maximization. This is not the case
in (6), where $I(X_i;Y_i)$ relies on a channel law $f_{Y_i|X_i}$ that
changes with $f_{X_i}$ and/or $P_i$, according to the behavioral
models that control the input distributions $f_{X_j}$ for $j \ne i$.

In this paper, we wish to evaluate $C_1(P_1)$ for the behavioral
models in Sec. III. As usual in nonlinear information theory,
it seems infeasible to find an exact expression, but we can
follow the standard approach and sandwich the achievable
rates between upper and lower bounds. No approximations
are involved in the derivations of these bounds.


V. Upper bounds

Our upper bounds on $C_1$ depend on the following fundamental
lemma.

Lemma 1: If $X$ and $Z$ are independent, then
$I(X;Y) \le I(X;Y|Z)$.

Proof: From [2, Eq. (2.119-120)],

$$ I(X;Y|Z) = I(X;Y) + I(X;Z|Y) - I(X;Z) $$
$$ = I(X;Y) + I(X;Z|Y) \tag{7} $$
$$ \ge I(X;Y), \tag{8} $$

where (7) follows from the independence of $X$ and $Z$ and (8)
from the nonnegativity of conditional mutual information [2,
Eq. (2.92)]. □

If $X$ and $Z$ are not independent, the lemma does not hold in general.
A notable example is when $X \to Y \to Z$ forms a Markov
chain, in which case $I(X;Y|Z) \le I(X;Y)$ follows by the
data-processing inequality [2, Eq. (2.122)].

For the specific channel model (1), the lemma can be used
to derive two upper bounds.

Theorem 2: For any distributions of $X_2$ and $X_3$, $C_1$ is
upper-bounded as

$$ C_1(P_1) \le \log_2\left(1 + \frac{P_1}{P_\mathrm{ASE}}\right). $$

Proof: From (6) and Lemma 1,

$$ C_1(P_1) \le \sup_{f_{X_1}:\, \mathbb{E}[|X_1|^2] = P_1} I(X_1;Y_1|X_2,X_3). \tag{9} $$

Given $X_2 = x_2$ and $X_3 = x_3$, (1) is an AWGN channel
with a constant offset $\epsilon x_2^2 x_3^*$. If this offset is known, it can
be subtracted at the receiver, resulting in a regular zero-mean
AWGN channel with noise variance $\mathbb{E}[|N_1|^2] = P_\mathrm{ASE}$.
Hence, the right-hand side of (9) equals the AWGN channel
capacity $\log_2(1 + P_1/P_\mathrm{ASE})$, independently of $x_2$ and $x_3$, which
completes the proof. □

Alternatively, the theorem can be derived from (1) via the
data-processing inequality [2, Th. 2.8.1].

Theorem 2 holds for any distributions of $X_2$ and $X_3$, and
therefore for any behavioral models. For certain behavioral
models, the bound can be tightened using the next theorem.

Theorem 3: If $X_2$ and $X_3$ are zero-mean, circularly symmetric
Gaussian (ZCG), then

$$ C_1(P_1) \le \int_0^\infty \frac{1}{P_2} e^{-u/P_2} \log_2\left(1 + \frac{P_1}{P_\mathrm{ASE} + \epsilon^2 u^2 P_3}\right) du. $$

Proof: Invoking Lemma 1, this time conditioning on $X_2$
only, yields

$$ C_1(P_1) \le \sup I(X_1;Y_1|X_2) $$
$$ = \sup \int_{\mathbb{C}} f(x_2)\, I(X_1;Y_1|X_2 = x_2)\,dx_2 $$
$$ \le \int_{\mathbb{C}} f(x_2) \sup I(X_1;Y_1|X_2 = x_2)\,dx_2, \tag{10} $$

where the suprema are over all $f_{X_1}$ such that $\mathbb{E}[|X_1|^2] = P_1$.
If $X_3$ is Gaussian, then (1) conditioned on $X_2 = x_2$ is a
zero-mean AWGN channel, because its two noise contributions
$\epsilon x_2^2 X_3^*$ and $N_1$ are both Gaussian. The power of $\epsilon x_2^2 X_3^*$ is
$\epsilon^2 |x_2|^4 P_3$, while the power of $N_1$ is $P_\mathrm{ASE}$ as before. Hence, the
supremum in (10) equals the capacity of an AWGN channel
with noise power $P_\mathrm{ASE} + \epsilon^2 |x_2|^4 P_3$,

$$ C_1(P_1) \le \int_{\mathbb{C}} f(x_2) \log_2\left(1 + \frac{P_1}{P_\mathrm{ASE} + \epsilon^2 |x_2|^4 P_3}\right) dx_2. \tag{11} $$

This bound can be simplified by using the circular symmetry
of

$$ f(x_2) = \frac{1}{\pi P_2} e^{-|x_2|^2/P_2}. $$

Let $U = |X_2|^2$. Then $U$ is exponentially distributed,

$$ f(u) = \frac{1}{P_2} e^{-u/P_2}, \quad u > 0. \tag{12} $$

The theorem now follows by changing the integration variable
in (11) from $x_2$ to $u = |x_2|^2$. □

¹ A similar analysis can be carried out for subchannels 2 and 3. By symmetry,
$C_3(P_3)$ is equivalent to $C_1(P_1)$, whereas $C_2(P_2)$ is different. Some of
the bounds in Sec. V and VI extend straightforwardly to $C_2$ as well (e.g.,
Theorems 4 and 5), whereas other bounding techniques, tailored to (2), would
be needed for a full characterization of $C_2(P_2)$.
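
The two upper bounds are straightforward to evaluate numerically. The Python sketch below is one possible way to do so, not the authors' implementation: it computes the AWGN bound of Theorem 2 in closed form and the integral of Theorem 3 by a midpoint rule after the substitution $t = u/P_2$. The function names, the step count, and the truncation point `t_max` are arbitrary choices.

```python
import math

def awgn_upper_bound(p1, p_ase):
    """Theorem 2: log2(1 + P1 / P_ASE)."""
    return math.log2(1 + p1 / p_ase)

def gaussian_interferer_upper_bound(p1, p2, p3, eps, p_ase,
                                    steps=20000, t_max=40.0):
    """Theorem 3, integrated over t = u / P2 so that the weight is exp(-t)."""
    dt = t_max / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt          # midpoint of the k-th subinterval
        u = t * p2                  # u = |x2|^2
        noise = p_ase + eps**2 * u**2 * p3
        total += math.exp(-t) * math.log2(1 + p1 / noise) * dt
    return total

if __name__ == "__main__":
    eps, p_ase = 610.0, 0.16e-3     # values quoted in Sec. II
    for p_mw in (0.1, 1.0, 10.0):
        p = p_mw * 1e-3             # equal powers, as in behavioral model (b)
        print(p_mw, awgn_upper_bound(p, p_ase),
              gaussian_interferer_upper_bound(p, p, p, eps, p_ase))
```

With `eps = 0` the integral collapses to the AWGN bound, which gives a simple consistency check of the quadrature.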


VI. Lower bounds 

Since the channel capacity is the supremum of mutual
information, a lower bound on capacity can be obtained
from the mutual information for any given input distribution.
Analogously, from (6),

$$ C_1(P_1) \ge I(X_1;Y_1) \tag{13} $$

for any input distribution $f_{X_1}$ with power $P_1$. In this section,
we will obtain lower bounds on $C_1(P_1)$ via (13).

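If all input distributions are discrete, it is feasible to calculate
the right-hand side of (13) by numerical integration, using
either of the following two theorems.

Theorem 4: If $X_1$, $X_2$, and $X_3$ are all discrete, uniformly
distributed over complex constellations $\mathcal{X}_1$, $\mathcal{X}_2$, and $\mathcal{X}_3$,
respectively, then

$$ I(X_1;Y_1) = \mathbb{E}\left[\log_2 \frac{f(Y_1|X_1)}{f(Y_1)}\right], \tag{14} $$

where

$$ f(y_1|x_1) = \frac{1}{\pi P_\mathrm{ASE} |\mathcal{X}_2||\mathcal{X}_3|} \sum_{x_2 \in \mathcal{X}_2} \sum_{x_3 \in \mathcal{X}_3} \exp\left(-\frac{|y_1 - x_1 - \epsilon x_2^2 x_3^*|^2}{P_\mathrm{ASE}}\right), \tag{15} $$

$$ f(y_1) = \frac{1}{|\mathcal{X}_1|} \sum_{x_1 \in \mathcal{X}_1} f(y_1|x_1). \tag{16} $$

Proof: From (1),

$$ f(y_1|x_1,x_2,x_3) = \frac{1}{\pi P_\mathrm{ASE}} \exp\left(-\frac{|y_1 - x_1 - \epsilon x_2^2 x_3^*|^2}{P_\mathrm{ASE}}\right). \tag{17} $$

Marginalizing $f(y_1|x_1,x_2,x_3)$ yields $f(y_1|x_1)$ and $f(y_1)$.
Finally, (14) follows by rewriting (4). □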

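Theorem 5: If $X_1$ is ZCG and $X_2$ and $X_3$ are discrete,
uniformly distributed over complex constellations $\mathcal{X}_2$ and $\mathcal{X}_3$,
respectively, then

$$ I(X_1;Y_1) = \mathbb{E}\left[\log_2 \frac{f(Y_1|X_1)}{f(Y_1)}\right], \tag{18} $$

where

$$ f(y_1|x_1) = \frac{1}{\pi P_\mathrm{ASE} |\mathcal{X}_2||\mathcal{X}_3|} \sum_{x_2 \in \mathcal{X}_2} \sum_{x_3 \in \mathcal{X}_3} \exp\left(-\frac{|y_1 - x_1 - \epsilon x_2^2 x_3^*|^2}{P_\mathrm{ASE}}\right), \tag{19} $$

$$ f(y_1) = \frac{1}{\pi (P_1 + P_\mathrm{ASE}) |\mathcal{X}_2||\mathcal{X}_3|} \sum_{x_2 \in \mathcal{X}_2} \sum_{x_3 \in \mathcal{X}_3} \exp\left(-\frac{|y_1 - \epsilon x_2^2 x_3^*|^2}{P_1 + P_\mathrm{ASE}}\right). \tag{20} $$

Proof: In (1), $X_1 + N_1$ is ZCG with variance $P_1 + P_\mathrm{ASE}$,
which yields

$$ f(y_1|x_2,x_3) = \frac{1}{\pi (P_1 + P_\mathrm{ASE})} \exp\left(-\frac{|y_1 - \epsilon x_2^2 x_3^*|^2}{P_1 + P_\mathrm{ASE}}\right). $$

Marginalizing this distribution with respect to $X_2$ and $X_3$
yields $f(y_1)$ in (20). Equation (19) is proved as in the proof
of Theorem 4, which completes the proof of (18). □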
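
A Monte Carlo estimate of the expectation in Theorem 5 can be sketched as follows. This is an illustrative Python implementation under stated assumptions, not the authors' code: the QPSK phase offset, the sample count, the seed, and all function names are arbitrary choices, and the common factor $1/(|\mathcal{X}_2||\mathcal{X}_3|)$ is dropped because it cancels in the ratio.

```python
import cmath
import math
import random

def qpsk(power):
    """QPSK constellation with E|X|^2 = power (phase offset pi/4 assumed)."""
    r = math.sqrt(power)
    return [r * cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]

def mi_gaussian_primary(p1, const2, const3, eps, p_ase, samples=20000, seed=1):
    """Monte Carlo estimate of I(X1;Y1) for ZCG X1 and discrete X2, X3."""
    rng = random.Random(seed)
    # All interference values eps * x2^2 * conj(x3), one per (x2, x3) pair
    offsets = [eps * x2**2 * x3.conjugate() for x2 in const2 for x3 in const3]

    def zcg(power):
        s = math.sqrt(power / 2)
        return complex(rng.gauss(0, s), rng.gauss(0, s))

    acc = 0.0
    for _ in range(samples):
        x1 = zcg(p1)
        y1 = x1 + rng.choice(offsets) + zcg(p_ase)
        num = sum(math.exp(-abs(y1 - x1 - d) ** 2 / p_ase)
                  for d in offsets) / (math.pi * p_ase)
        den = sum(math.exp(-abs(y1 - d) ** 2 / (p1 + p_ase))
                  for d in offsets) / (math.pi * (p1 + p_ase))
        acc += math.log2(num / den)
    return acc / samples

if __name__ == "__main__":
    # Normalized units: P_ASE = 1, interferer power 1, eps = 0.5
    print(mi_gaussian_primary(10.0, qpsk(1.0), qpsk(1.0), 0.5, 1.0))
```

Because both densities are finite sums, the only randomness is in the outer expectation, which avoids the nested Monte Carlo integration that continuous interferer distributions would require.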

In Sec. VII, the expectations in (14) and (18) will be
evaluated by Monte Carlo integration to obtain lower bounds
on $C_1$ via (13). Theorem 4 applies to all three behavioral
models, as long as the interferer distributions $X_2$ and $X_3$ are
discrete, whereas Theorem 5 applies to some cases of models
(a) and (b).

Theoretically, Theorems 4 and 5 can be modified to hold
also when at least one of the input distributions is continuous.
In this case, the corresponding sums in the expressions for
$f(y_1|x_1)$ and $f(y_1)$ will be replaced by integrals. However,
these integrals cannot in general be evaluated analytically.
This causes numerical problems in (14) and (18), where the
Monte Carlo estimate of the expectation may become grossly
inaccurate if $f(y_1|x_1)$ is not exact. Applying Monte Carlo
integration inside another Monte Carlo integral should be
avoided if at all possible. Therefore, we wish to find other
lower bounds on the mutual information. To this end, the
following lemma, due to Emre Telatar, is useful. It was stated
and proved in a, eqi, and it can also be obtained as a special
case of the auxiliary-channel lower bound [35, Sec. VI].²
Lemma 6: Let $X_G$ and $Y_G$ be complex, dependent, jointly
Gaussian random variables. Let $Y$ be any complex random
variable (possibly non-Gaussian) such that

$$ \mathbb{E}[|Y|^2] = \mathbb{E}[|Y_G|^2], $$
$$ \mathbb{E}[Y^* X_G] = \mathbb{E}[Y_G^* X_G]. $$

Then

$$ I(X_G;Y) \ge I(X_G;Y_G). $$

The next lemma gives the mutual information of two complex,
jointly Gaussian variables. It is proved by straightforward
evaluation of the integral in (4); see, e.g., ll^ Eq. (9-8)].

Lemma 7: If $X_G$ and $Y_G$ are complex, jointly Gaussian
variables with zero mean, variances $\mathbb{E}[|X_G|^2] = \sigma_X^2$ and
$\mathbb{E}[|Y_G|^2] = \sigma_Y^2$, respectively, and covariance
$\mathbb{E}[X_G Y_G^*] = s_{XY}$, then their mutual information is

$$ I(X_G;Y_G) = \log_2 \frac{\sigma_X^2 \sigma_Y^2}{\sigma_X^2 \sigma_Y^2 - |s_{XY}|^2}. $$

The preceding two lemmas make it possible to prove the
following lower bound.

Theorem 8: For any zero-mean interferer distributions $f_{X_2}$
and $f_{X_3}$,

$$ C_1(P_1) \ge \log_2\left(1 + \frac{P_1}{P_\mathrm{ASE} + \epsilon^2 P_3 \mathbb{E}[|X_2|^4]}\right). \tag{21} $$

Proof: Combining (13) with Lemmas 6 and 7 yields

$$ C_1(P_1) \ge \log_2 \frac{P_1 \sigma_Y^2}{P_1 \sigma_Y^2 - |s_{XY}|^2}, \tag{22} $$

where

$$ \sigma_Y^2 = \mathbb{E}[|Y_1|^2], $$
$$ s_{XY} = \mathbb{E}[X_1 Y_1^*], $$

and $Y_1$ is given by (1) for a ZCG input distribution $X_1$. Using
the independence of $X_1$, $X_2$, and $X_3$,

$$ \sigma_Y^2 = \mathbb{E}[|X_1 + \epsilon X_2^2 X_3^* + N_1|^2] $$
$$ = \mathbb{E}[|X_1|^2] + \epsilon^2 \mathbb{E}[|X_2|^4]\,\mathbb{E}[|X_3|^2] + \mathbb{E}[|N_1|^2] $$
$$ = P_1 + \epsilon^2 P_3 \mathbb{E}[|X_2|^4] + P_\mathrm{ASE}, \tag{23} $$

$$ s_{XY} = \mathbb{E}[X_1 (X_1 + \epsilon X_2^2 X_3^* + N_1)^*] $$
$$ = \mathbb{E}[|X_1|^2] $$
$$ = P_1. \tag{24} $$

The theorem now follows by substituting (23)–(24) into (22)
and simplifying. □

² To see this, substitute $X = X_G$, $p(x) = p_G(x)$, $q(y|x) = p_G(x,y)/p_G(x)$,
and $q_p(y) = p_G(y)$ in [35, Eq. (34)].

The right-hand side of (21) depends on the statistics of $X_2$.
For example, if $X_2$ is discrete, uniformly distributed over a
constellation $\mathcal{X}_2$, then

$$ \mathbb{E}[|X_2|^4] = \frac{1}{|\mathcal{X}_2|} \sum_{x \in \mathcal{X}_2} |x|^4. \tag{25} $$

In the special case of a phase-shift keying (PSK) constellation
with power $P_2$, (25) simplifies into $\mathbb{E}[|X_2|^4] = P_2^2$.

On the other hand, if $X_2$ is ZCG, then $\mathbb{E}[|X_2|^4]$ can be
calculated by setting $X_2 = X_r + jX_i$, where $j = \sqrt{-1}$ and
$X_r$ and $X_i$ are real, independent, Gaussian variables with zero
mean and variance $\sigma^2 = P_2/2$. Then

$$ \mathbb{E}[|X_2|^4] = \mathbb{E}[|X_r + jX_i|^4] $$
$$ = \mathbb{E}[X_r^4] + \mathbb{E}[X_i^4] + 2\,\mathbb{E}[X_r^2]\,\mathbb{E}[X_i^2] $$
$$ = 3\sigma^4 + 3\sigma^4 + 2\sigma^2\sigma^2 \tag{26} $$
$$ = 2P_2^2, \tag{27} $$

where (26) follows from a standard result in mathematical
statistics [32, Eq. (5-46)].

Theorem 8 will be used in the next section to lower-bound
$C_1$ in certain cases when the interference is governed by
behavioral models (a) or (b).
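
The fourth-moment expressions and the resulting lower bound are simple enough to check numerically. The Python sketch below is an illustration only: the function names are hypothetical, and the bound is written in the closed form obtained by substituting the variance and covariance expressions of the proof into the jointly Gaussian mutual information, namely $\log_2(1 + P_1/(P_\mathrm{ASE} + \epsilon^2 P_3 \mathbb{E}[|X_2|^4]))$.

```python
import cmath
import math

def psk(m, power):
    """M-PSK constellation with E|X|^2 = power."""
    r = math.sqrt(power)
    return [r * cmath.exp(2j * math.pi * k / m) for k in range(m)]

def fourth_moment(constellation):
    """E|X|^4 for a uniform distribution over the given constellation."""
    return sum(abs(x) ** 4 for x in constellation) / len(constellation)

def gaussian_fourth_moment(power):
    """E|X|^4 = 2 P^2 for a ZCG variable with E|X|^2 = P."""
    return 2 * power ** 2

def gaussian_input_lower_bound(p1, p3, m4_x2, eps, p_ase):
    """Lower bound on C1 with ZCG X1 and zero-mean interferers (sketch)."""
    return math.log2(1 + p1 / (p_ase + eps ** 2 * p3 * m4_x2))

if __name__ == "__main__":
    eps, p_ase, p = 610.0, 0.16e-3, 1e-3
    print(gaussian_input_lower_bound(p, p, fourth_moment(psk(4, p)), eps, p_ase))
    print(gaussian_input_lower_bound(p, p, gaussian_fourth_moment(p), eps, p_ase))
```

Since the Gaussian fourth moment is twice that of a PSK constellation with the same power, the bound is lower for Gaussian interferers than for PSK interferers at equal power.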

VII. Results 

In this section, the bounds of Sec. V and VI are numerically
evaluated for the multiuser channel (1)–(3), using the parameters
$\epsilon$ and $P_\mathrm{ASE}$ as specified in Sec. II. Fig. 2(a)–(c) illustrate
via upper and lower bounds the single-user achievable rates
$C_1(P_1) = \sup I(X_1;Y_1)$, where the maximization is over all
distributions $f_{X_1}$ with power $P_1$, combined with the three
behavioral models in Sec. III. For models (a) and (b), the
interferer distributions $f_{X_2}$ and $f_{X_3}$ are either uniform over
a QPSK constellation or Gaussian, which in total gives five
scenarios. We will discuss the three models separately below.

A. Behavioral model (a)—fixed interferer distributions 

With behavioral model (a), the interferer distributions $f_{X_2}$
and $f_{X_3}$ are fixed and do not change with $f_{X_1}$. The interference
power is also fixed at a level of $P_2/P_\mathrm{ASE} = P_3/P_\mathrm{ASE} =
5$ dB. The applied bounds are different depending on the nature
of the interferers: If $X_2$ and $X_3$ follow QPSK distributions,
then we obtain an upper bound from Theorem 2 and a lower
bound from Theorem 5 or 8, where Monte Carlo integration
was used to estimate the expectation in (18). The two lower
bounds turn out to be numerically indistinguishable; in Fig. 2(a),
Theorem 5 is plotted. On the other hand, if $X_2$ and $X_3$
follow Gaussian distributions, our upper bound is given by
Theorem 3 and the lower bound by Theorem 8 and (27).

The upper and lower bounds follow each other and together
prove that the achievable rate increases to infinity if the
signal power can be increased arbitrarily. This result is not
surprising, since $X_1$ dominates over the two other terms in (1)
at sufficiently high power $P_1$. The channel is in fact a
single-user channel, described by a fixed distribution $f_{Y_1|X_1}$, and
the channel capacity is nondecreasing for all such channels,
linear or nonlinear [12]. The capacity is larger in the case of
discrete input distributions for the interfering channels than in
the Gaussian case, but the capacity follows the same general
trend in both cases.

Fig. 2. The achievable rates $C_1(P_1)$ of user 1 in a WDM system, with the
three behavioral models (a), (b), and (c), defined in Sec. III, as a function of the
signal power $P_1$. Dashed lines give upper bounds and solid lines lower bounds.
Shaded regions indicate the amount of uncertainty. Behavioral models (a) and
(b) both have two versions, depending on the type of interferer distributions. In
(c), the lower bound is obtained as the envelope of multiple bounds, indicated
with gray curves. Dotted vertical lines correspond to curves in Fig. 3.

B. Behavioral model (b)—adaptive interferer power 

With behavioral model (b), the power of all users is the
same, but the distributions may be different. The same upper
and lower bounds as in Fig. 2(a) are plotted in Fig. 2(b):
Theorems 2 and 5 with QPSK interference and Theorems 3
and 8 with Gaussian interference. With this behavioral model,
the achievable rate of the primary channel is fundamentally
different depending on the nature of the interference. If
the interferers' distributions are discrete, the achievable rate
increases with power towards infinity. This can be intuitively
understood as follows. The magnitude of the interference term
$\epsilon X_2^2 X_3^*$ will, at high enough power $P_1 = P_2 = P_3$, be
much larger than $X_1$ or $N_1$. Hence, receiver 1 can detect the
value of $\epsilon X_2^2 X_3^*$ with high reliability (only four values are
possible in the QPSK case) and subtract this value from the
received signal $Y_1$. After this so-called interference cancellation,
subchannel 1 is effectively an AWGN channel $X_1 + N_1$,
whose capacity is $\log_2(1 + P_1/P_\mathrm{ASE})$. This is the reason why
the two bounds converge near $P_1/P_\mathrm{ASE} = 14$ dB and above.
However, no similar receiver strategy is possible if $X_2$ and $X_3$
are Gaussian,³ because then $\epsilon X_2^2 X_3^*$, which has a continuous
distribution with large variance, effectively drowns the weaker
contribution from $X_1$. Therefore, this achievable rate has a
peak at a moderate power, after which it decreases towards
zero for very high power, as seen in Fig. 2(b).
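
The peak-and-decay behavior with Gaussian interferers can be reproduced from the Gaussian-input lower bound of Sec. VI alone. The Python sketch below assumes equal powers $P_1 = P_2 = P_3 = P$ and $\mathbb{E}[|X_2|^4] = 2P^2$, so that the bound reads $\log_2(1 + P/(P_\mathrm{ASE} + 2\epsilon^2 P^3))$. The closed-form maximizer follows from elementary calculus and is not stated in the paper; it locates the peak of this particular lower bound, not of the achievable rate itself.

```python
import math

def lower_bound_equal_power_gaussian(p, eps, p_ase):
    """Gaussian-input lower bound with ZCG interferers at equal power p."""
    return math.log2(1 + p / (p_ase + 2 * eps ** 2 * p ** 3))

def peak_power(eps, p_ase):
    """Maximizer of p / (p_ase + 2 eps^2 p^3): p^3 = p_ase / (4 eps^2)."""
    return (p_ase / (4 * eps ** 2)) ** (1 / 3)

if __name__ == "__main__":
    eps, p_ase = 610.0, 0.16e-3     # values quoted in Sec. II
    p_star = peak_power(eps, p_ase)
    print(f"peak at {p_star * 1e3:.2f} mW, "
          f"{10 * math.log10(p_star / p_ase):.1f} dB above P_ASE, "
          f"bound = {lower_bound_equal_power_gaussian(p_star, eps, p_ase):.2f} bit/symbol")
```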

C. Behavioral model (c)—adaptive interferer distribution 

To obtain a lower bound with behavioral model (c), i.e.,
when all users apply the same input distribution, we apply
Theorem 4 with a suitably chosen input distribution
$f_{X_1} = f_{X_2} = f_{X_3}$. For the same reasons as in Fig. 2(b), a discrete
input distribution is advantageous when the interference is
strong. We therefore consider M-PSK constellations with
uniform probabilities and choose the integer $M$ suitably, as
described in the following.

The bounds with behavioral model (c) are illustrated in
Fig. 2 (c). The upper bound is again Theorem 2. The lower
bound is obtained from Theorem 4 as discussed in the previous
paragraph. Each $M = 2, \ldots, 16$ gives rise to one lower bound,
indicated in gray. As visible in the bottom right of the figure,
each of these bounds converges to $\log_2 M$ at high power. This
can be understood as follows. As explained in Sec. VII-B, the
interference term in (1) can be reliably detected by
receiver 1 and subtracted from $Y_1$. This holds for any discrete
constellation at sufficiently high power. After interference
cancellation, the effective channel is again $X_1 + N_1$, whose
mutual information with a uniform $M$-PSK input distribution
is asymptotically $\log_2 M$. Hence, for every $M$, there exists
a power threshold above which the lower bound is arbitrarily
close to $\log_2 M$. This proves that the envelope of these bounds,
shown in black in Fig. 2 (c), grows unboundedly.

† As stated earlier, no receiver knows any of the other subchannels'
codebooks. If these codebooks were known, the interference could be detected
and subtracted even for Gaussian $X_2$ and $X_3$ [Sec. 15.1.5].

Fig. 3. The mutual information according to Theorem 4 for $M$-PSK
constellations with uniform probabilities, for the indicated values of $P_1/P_{\mathrm{ASE}}$.
The peak of each curve yields the lower bound in Fig. 2 (c).
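
The convergence to $\log_2 M$ after interference cancellation can be checked numerically. The following is a minimal Monte Carlo sketch for the interference-free channel $Y_1 = X_1 + N_1$ only. It does not evaluate Theorem 4, and the function name, sample size, and noise normalization (unit variance, so that the power equals $P_1/P_{\mathrm{ASE}}$) are assumptions made for illustration.

```python
import numpy as np

def mpsk_awgn_mi(M, snr_db, n_samples=20000, seed=0):
    """Monte Carlo estimate of I(X;Y) in bit/symbol for uniform M-PSK over Y = X + N."""
    rng = np.random.default_rng(seed)
    p = 10.0 ** (snr_db / 10.0)  # signal power; noise variance is normalized to 1
    points = np.sqrt(p) * np.exp(2j * np.pi * np.arange(M) / M)
    x = points[rng.integers(M, size=n_samples)]
    # Circularly symmetric complex Gaussian noise with unit variance
    n = (rng.standard_normal(n_samples)
         + 1j * rng.standard_normal(n_samples)) / np.sqrt(2.0)
    y = x + n
    # Squared distances from each received sample to every constellation point
    d = np.abs(y[:, None] - points[None, :]) ** 2
    # Exponents of the likelihood ratios p(y | x_j) / p(y | x_sent)
    e = np.abs(n)[:, None] ** 2 - d
    # Numerically stable log2 of the sum over j of exp(e)
    m = e.max(axis=1, keepdims=True)
    log_ratio = (m[:, 0] + np.log(np.exp(e - m).sum(axis=1))) / np.log(2.0)
    return np.log2(M) - log_ratio.mean()

if __name__ == "__main__":
    for M in (2, 4, 8, 16):
        print(M, mpsk_awgn_mi(M, 40.0), np.log2(M))
```

At sufficiently high power the estimate approaches $\log_2 M$ for each $M$, which is the saturation behavior of the gray curves described above.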

The optimization process is illustrated in Fig. 3, which
shows the mutual information $I(X_1; Y_1)$ according to Theorem 4
as a function of $M = 2, \ldots, 16$, for selected values
of $P_1/P_{\mathrm{ASE}}$. At low signal power, the mutual information
is practically the same for any $M$-PSK constellation (and
actually for any zero-mean distribution, including Gaussian),
whereas the optimal $M$ tends to increase with power in the
nonlinear regime. We know for sure that $M$-PSK constellations
are not optimal,‡ but they suffice to show the qualitative
trend of the achievable rate: it again grows with increasing
power towards infinity. This result is significantly stronger
than the theoretical prediction for this behavioral model with
arbitrary channel models, which only states that the
achievable rate is nondecreasing.

VIII. Conclusions and discussion 

Multiuser information theory, or network information theory,
is still in its infancy. In the information theory literature,
the most common approach is to study the multiuser
capacity region, i.e., the set of achievable rates for all users
simultaneously. In this work, however, we followed the most
common approach in optical communications, which is to
study the channel capacity of a single user in the system.
More specifically, we considered the achievable rate of a
single wavelength in a multiuser WDM system, assuming
certain behavioral models for the transmission on the other
wavelengths.

For behavioral models (a) and (c), the achievable rate is
unbounded with the signal power. With model (b), however,
the outcome depends crucially on the distributions on the
interfering channels; the achievable rate may increase indefinitely,
as with the other behavioral models, or it may decrease to zero
as the signal power increases. These results were obtained by
analytically deriving both upper and lower bounds, in contrast
to most previous works, which have studied lower bounds
alone.

‡ E.g., a satellite constellation would improve the lower bound, at least
in the range between 6 and 11 dB.

On a theoretical level, the most important conclusion in this
paper is that the results depend strongly on the assumed
behavioral model. We emphasize that whenever a single-channel
model is derived for a multiuser system, there is always an
underlying behavioral model involved. However, despite their
significance, behavioral models have not yet received much
attention in optical communications. Our recommendation
to everyone working with the capacity of such single-user
channel models is to clearly state and justify the behavioral
model, because it has such a profound impact on the end
results.

On a more practical level, the main message is that
unbounded capacity growth is indeed possible, under some
specific conditions: (i) the interferers use discrete
constellations; (ii) the channel model depends on the actual signals
transmitted by all users, not on the statistical properties of
signals; (iii) the symbol clocks of different users are
synchronized; and (iv) the receiver applies multiuser detection.

The results were computed for a dispersionless three-user
WDM model (1)–(3), derived in the weakly nonlinear regime.
Despite its simplicity, this channel model serves to illustrate
the fundamental differences between behavioral models.
Future work may involve extending the channel model to the
strongly nonlinear regime or accounting for dispersion, more
users (wavelength channels), or dual polarization. It is not
known to which extent the conclusions above extend to such
more realistic channels.